On Pattern Occurrences in a Random Text

Authors

  • Ioannis Fudos
  • Evaggelia Pitoura
  • Wojciech Szpankowski
Abstract

Consider a given pattern H and a random text T of length n. We assume that symbols in the text occur independently, and various symbols have different probabilities of occurrence (i.e., the so-called asymmetric Bernoulli model). We are concerned with the probability of exactly r occurrences of H in the text T. We derive the generating function of this probability and show that asymptotically it behaves as $\alpha n^r \rho_H^{n-r-1}$, where $\alpha$ is an explicitly computed constant and $\rho_H < 1$ is the root of an equation depending on the structure of the pattern. We then extend these findings to random patterns.
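The quantity studied in the abstract, the probability of exactly r occurrences of H in a Bernoulli text of length n, can also be computed exactly for small inputs. The sketch below is not the authors' generating-function method: it is a plain dynamic program over a Knuth-Morris-Pratt matching automaton, and it assumes that overlapping occurrences are counted. The function names `kmp_automaton` and `occurrence_distribution` are hypothetical, chosen only for this illustration.

```python
def kmp_automaton(pattern, alphabet):
    """Build delta[state][symbol] for the KMP matching automaton.

    A state is the length of the longest prefix of `pattern` that is a
    suffix of the text read so far; state len(pattern) means a match.
    """
    m = len(pattern)
    # fail[i] = length of the longest proper border of pattern[:i]
    fail = [0] * (m + 1)
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for c in alphabet:
            if state < m and pattern[state] == c:
                delta[state][c] = state + 1
            elif state == 0:
                delta[state][c] = 0
            else:
                # fall back to the border state, which is already filled in
                delta[state][c] = delta[fail[state]][c]
    return delta


def occurrence_distribution(pattern, probs, n):
    """Return dist with dist[r] = P(exactly r occurrences in a text of length n).

    `probs` maps each symbol to its probability (asymmetric Bernoulli model:
    symbols are independent but need not be equally likely).
    Assumes a non-empty pattern and counts overlapping occurrences.
    """
    m = len(pattern)
    delta = kmp_automaton(pattern, list(probs))
    max_r = max(n - m + 1, 0)  # no text of length n can hold more matches
    cur = [[0.0] * (max_r + 1) for _ in range(m + 1)]
    cur[0][0] = 1.0
    for _ in range(n):
        nxt = [[0.0] * (max_r + 1) for _ in range(m + 1)]
        for q in range(m + 1):
            for r in range(max_r + 1):
                p = cur[q][r]
                if p == 0.0:
                    continue
                for c, pc in probs.items():
                    q2 = delta[q][c]
                    r2 = r + (1 if q2 == m else 0)
                    nxt[q2][r2] += p * pc
        cur = nxt
    return [sum(cur[q][r] for q in range(m + 1)) for r in range(max_r + 1)]
```

For example, `occurrence_distribution("aa", {"a": 0.3, "b": 0.7}, 3)` returns the probabilities of 0, 1 and 2 occurrences of "aa" in a three-symbol text. The running time is roughly n · |H| · (n − |H| + 1) · |alphabet|, so this is only a numerical check for moderate n, whereas the asymptotic form in the abstract describes the behavior for large n.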

Similar resources

ON PATTERN OCCURRENCES IN A RANDOM TEXT (April)

Consider a given pattern H and a random text T of length n. We assume that symbols in the text occur independently, and various symbols have different probabilities of occurrence (i.e., the so-called asymmetric Bernoulli model). We are concerned with the probability of exactly r occurrences of H in the text T. We derive the generating function of this probability, and show that asymptotically it...

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Worst Case Efficient Single and Multiple String Matching in the RAM Model

In this paper, we explore worst-case solutions for the problems of single and multiple matching on strings in the word RAM model with word length w. In the first problem, we have to build a data structure based on a pattern p of length m over an alphabet of size σ such that we can answer to the following query: given a text T of length n, where each character is encoded using log σ bits return ...

Frequency of Pattern Occurrences in a (DNA) Sequence

Consider a given pattern H and a random text T of length n. We assume that consecutive symbols in the text are generated either independently or with a Markovian dependency, i.e., we study both the so-called Bernoulli model and the Markovian model. Our goal is to assess the limiting distribution of the frequency of the pattern occurrences in a random sequence. Overlapping copies of a pattern ...

Efficient String Matching with k Mismatches

Given a text of length n, a pattern of length m and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k mismatches. The algorithm runs in $O(k(m \log m + n))$ time. 1. INTRODUCTION The problem of string matching with k mismatches is defined as follows. Suppose we are given a text of length n, a pattern of length m and an integer k. Fi...

Journal:
  • Inf. Process. Lett.

Volume 57  Issue 

Pages  -

Publication date: 1996